Goldilocks #210

Open
TomWambsgans wants to merge 68 commits into main from goldilocks

@TomWambsgans
Collaborator

No description provided.

TomWambsgans and others added 30 commits April 15, 2026 18:18
Co-authored-by: Copilot <copilot@github.com>
w
Co-authored-by: Copilot <copilot@github.com>
Bring main's MTU-XMSS structure (tweak table, public_param, T-Sponge with
replacement) into the goldilocks branch with all poseidon-related sizes
halved:

  field-element widths    main (KoalaBear)   goldilocks
  --------------------    ----------------   ----------
  TWEAK_LEN                       2               1
  XMSS_DIGEST_LEN                 4               2
  RANDOMNESS_LEN_FE               6               3
  MESSAGE_LEN_FE                  8               4
  PUBLIC_PARAM_LEN_FE             4               2
  POSEIDON1_WIDTH                16               8
  DIGEST_LEN_FE                   8               4

Tweak table slots are 2 FE (1 actual tweak FE + 1 zero pad). The packed
tweak fits in a single 64-bit Goldilocks element via
`(tweak_type << 42) | (sub_position << 32) | index`.
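A minimal sketch of the packing expression above, for illustration only. The helper names `pack_tweak` and `unpack_tweak` are hypothetical, and the slot widths (32 bits for `index`, 10 bits for `sub_position`, the remaining 22 bits for `tweak_type`) are assumptions read off the shift amounts, not values confirmed by the branch.

```rust
/// Goldilocks prime p = 2^64 - 2^32 + 1.
const GOLDILOCKS_P: u64 = 0xFFFF_FFFF_0000_0001;

// Assumed slot widths, inferred from the shifts `<< 32` and `<< 42`.
const INDEX_BITS: u32 = 32;
const SUB_POSITION_BITS: u32 = 10;
const TWEAK_TYPE_BITS: u32 = 64 - 42;

/// Hypothetical helper: packs the three components into one u64.
/// Returns `None` if a component overflows its assumed slot, or if the
/// packed value is not a canonical Goldilocks element (i.e. is >= p).
fn pack_tweak(tweak_type: u64, sub_position: u64, index: u64) -> Option<u64> {
    if index >> INDEX_BITS != 0
        || sub_position >> SUB_POSITION_BITS != 0
        || tweak_type >> TWEAK_TYPE_BITS != 0
    {
        return None;
    }
    let packed = (tweak_type << 42) | (sub_position << 32) | index;
    if packed < GOLDILOCKS_P { Some(packed) } else { None }
}

/// Hypothetical inverse of `pack_tweak`: (tweak_type, sub_position, index).
fn unpack_tweak(packed: u64) -> (u64, u64, u64) {
    let index = packed & 0xFFFF_FFFF;
    let sub_position = (packed >> 32) & 0x3FF;
    let tweak_type = packed >> 42;
    (tweak_type, sub_position, index)
}
```

With small `tweak_type` values the packed word stays far below p, so the canonicity check only rejects inputs whose upper bits are almost all set.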

Port main's poseidon precompile features (`half_output`,
`hardcoded_offset_left`) from Poseidon16 to Poseidon8, with new committed
columns for the flags and `effective_index_left_first/second`. The
half-output trace tail values are filled in a post-pass from
`memory_padded` (lookup-only — the AIR doesn't constrain them).

Encoding decomposition uses the goldilocks-proven 21 chunks of W=3 bits
per FE with a factored 1-bit canonical check
`(diff)·(diff − 2^63) == 0`, applied to the first 2 of 4 output FE for
exactly V = 42 chunks (no V_GRINDING).
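One possible reading of the decomposition above, sketched outside the AIR: 21 chunks of 3 bits cover the low 63 bits of a field element, and `diff` is the element minus the recomposed chunks, so it can only be 0 or 2^63 (the one remaining bit). The function names are hypothetical and the chunk order (little-endian) is an assumption; the actual constraint layout is not shown in this PR description.

```rust
const P: u128 = 0xFFFF_FFFF_0000_0001; // Goldilocks prime 2^64 - 2^32 + 1
const CHUNK_BITS: u32 = 3; // W
const CHUNKS_PER_FE: usize = 21; // 21 * 3 = 63 bits

/// Splits the low 63 bits of `x` into 21 little-endian chunks of 3 bits.
fn decompose(x: u64) -> [u8; CHUNKS_PER_FE] {
    let mut chunks = [0u8; CHUNKS_PER_FE];
    for (i, c) in chunks.iter_mut().enumerate() {
        *c = ((x >> (CHUNK_BITS * i as u32)) & 0b111) as u8;
    }
    chunks
}

/// Evaluates diff * (diff - 2^63) mod p, where
/// diff = x - sum(chunk_i * 8^i), and reports whether it is zero.
/// Assumes `x` is canonical (x < p).
fn top_bit_check(x: u64, chunks: &[u8; CHUNKS_PER_FE]) -> bool {
    let mut low: u128 = 0;
    for (i, &c) in chunks.iter().enumerate() {
        low += (c as u128) << (CHUNK_BITS * i as u32);
    }
    let diff = (x as u128 + P - low) % P;
    let shifted = (diff + P - (1u128 << 63)) % P;
    (diff * shifted) % P == 0
}
```

Applied to 2 field elements, this yields the 2 × 21 = 42 chunks mentioned above.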

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TomWambsgans and others added 7 commits May 21, 2026 15:03
Conflicts resolved by adopting main's BusInteraction refactor (PR #228)
while keeping Goldilocks-specific bits:
- Ported poseidon_8/mod.rs to the new bus_interactions() API (renamed
  COL_FLAG→COL_MULTIPLICITY, COL_PRECOMPILE_DATA→COL_DOMAINSEP,
  POSEIDON_PRECOMPILE_DATA(=1)→POSEIDON_DOMAINSEP_BASE(=3); merged
  lookups() + bus() into bus_interactions()).
- table_enum: kept Table::poseidon8(), adopted MAX_BUS_WIDTH/LOG_MAX_BUS_WIDTH.
- extension_op air.rs: virtual cols at indices 21,22 (DIMENSION=3) with
  new names COL_MULTIPLICITY/COL_DOMAINSEP_EXTENSION_OP; switched to
  eval_bus_virtual.
- verify_execution: kept *get_poseidon8(), added MAX_BYTECODE_LOG_SIZE check.
- Renamed poseidon_16 references → poseidon_8 in test (prove_poseidon_8.rs).
- Removed the spurious poseidon_16/ directory left in the tree.
- Adjusted recursion.py to use copy_ef instead of copy_5 (DIM=3).
# Conflicts:
#	crates/lean_vm/src/tables/extension_op/air.rs
#	crates/lean_vm/src/tables/poseidon_16/mod.rs
Adapt the always-IV slice-hashing scheme (length absorbed into the IV
for domain separation) to the Goldilocks Poseidon8 permutation
(width 8, rate 4, digest 4) instead of KoalaBear Poseidon16
(width 16, rate 8, digest 8).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TomWambsgans and others added 5 commits May 26, 2026 01:46
# Conflicts:
#	crates/lean_compiler/snark_lib.py
#	crates/lean_compiler/tests/test_compiler.rs
#	crates/lean_compiler/tests/test_data/program_166.py
#	crates/lean_compiler/zkDSL.md
#	crates/rec_aggregation/zkdsl_implem/hashing.py
#	crates/rec_aggregation/zkdsl_implem/main.py
# Conflicts:
#	crates/lean_prover/src/lib.rs
#	crates/lean_prover/src/test_zkvm.rs
# Conflicts:
#	crates/backend/fiat-shamir/src/challenger.rs
#	crates/backend/fiat-shamir/tests/grinding.rs
#	crates/backend/sumcheck/src/product_computation.rs
#	crates/lean_prover/src/verify_execution.rs
#	crates/rec_aggregation/src/bytecode_claims.rs
#	crates/rec_aggregation/src/type_2_aggregation.rs
#	crates/rec_aggregation/zkdsl_implem/fiat_shamir.py
#	crates/rec_aggregation/zkdsl_implem/main.py
#	crates/sub_protocols/src/quotient_gkr/mod.rs
#	crates/utils/src/wrappers.rs
#	crates/whir/tests/run_whir.rs
Adapt main's column/flag renames (e.g. POSEIDON_*COL_INDEX_INPUT_LEFT ->
POSEIDON_*COL_NU_A, EXT_OP_FLAG_MUL -> EXT_OP_FLAG_DOT_PRODUCT,
ExtensionOp::PolyEq -> Eq, COL_COMP -> COL_ACC, etc.) to the
goldilocks-specific code that uses Poseidon8 and cubic (DIMENSION=3)
extension. Drop the KoalaBear-targeted python verifier and its
check_whir_configs test, which don't apply to the goldilocks branch
(folding_pow_bits was removed in goldilocks; WHIR_CONFIGS and Fp
primitives are KoalaBear-specific).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TomWambsgans force-pushed the main branch 2 times, most recently from c5a3050 to 9dc5d68 May 28, 2026 12:02
TomWambsgans and others added 8 commits May 29, 2026 01:06
Adopt main's overwrite (permutation-based) sponge hashing on the Goldilocks
branch, keeping Goldilocks field types throughout (WIDTH 8, RATE 4, DIGEST 4,
poseidon8, cubic extension).

Key reconciliations:
- utils/symetric/whir/fiat-shamir: overwrite sponge (hash_slice_rtl,
  precompute_zero_suffix_state), poseidon_hash_slice, two-perm merkle_verify.
- Poseidon table: kept the Goldilocks x^7 permutation AIR (sparse partial
  rounds) but adopted main's I/O interface, halved: 3-way output gating via
  flag_out2/flag_out4, added permute_half, unified Davies-Meyer output gates.
  New precompile set: poseidon8_compress_half/_quarter (+_hardcoded_left),
  poseidon8_permute/_permute_half (+_hardcoded_left).
- Compiler, instruction encoder/display, prover trace post-pass updated to the
  new flags and names.
- zkDSL verifier (hashing.py, main.py, xmss_aggregate.py) and XMSS signer
  (wots.rs) switched to the overwrite sponge; encoding decomposition and
  gl-specific constants (copy_ef/copy_digest, NUM_ENCODING_FE, strides) kept.
- Dropped python-verifier (removed on goldilocks).

Validated: workspace builds, fmt+clippy clean, poseidon AIR proves/verifies,
compiler + lean_prover + xmss + sub_protocols tests pass, aggregation bytecode
compiles, and end-to-end recursive XMSS aggregation proves+verifies.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adopt main's rec_aggregation rename (type 1/2 -> single/multi-message) and the
new bytecode-claim Fiat-Shamir handling, keeping Goldilocks field types/sizes.

Reconciliations:
- bytecode_claims.rs: adopt main's direct claim ingestion into Fiat-Shamir
  (build_bytecode_claims_ingested_by_fiatshamir + observe_scalars, dropping
  hash_bytecode_claims), but keep Goldilocks poseidon8 (get_poseidon8).
- multi_message_aggregation.rs: adopt build_multi_message_input_data name,
  keep DIGEST_LEN-generic layout comment.
- zkdsl main.py: adopt single/multi-message naming and main's direct-ingestion
  reduce_bytecode_claims, but keep Goldilocks DIGEST_LEN-generic copy_digest
  loops (instead of main's hardcoded copy_8/copy_32) and SINGLE_MESSAGE flag
  placeholders provided by compilation.rs.
- zkdsl hashing.py: slice_hash_continue uses poseidon8_permute_half (not the
  KoalaBear poseidon16 variant that auto-merged in).

Validated: workspace builds, fmt+clippy clean, cargo testall passes, and
end-to-end recursion (n=2) proves+verifies.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…idon1-8

Implemented and measured the Appendix-B sparse partial-round decomposition
(the same one the AIR/trace-gen and the KoalaBear-16 permutation use) in the
AVX-512 permutation. It is ~13% slower for Goldilocks: this circulant MDS has
tiny entries {1,3,4,7,8,9} that strength-reduce to shifts/adds and batch 8
terms into a single reduce128 per output, while the sparse form needs
arbitrary-constant 64x64 multiplies (one reduce128 each → 15 vs 8 reductions
per partial round). Reverted the implementation, kept a comment so the dead
end isn't re-explored.
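To illustrate why the dense circulant form is cheap, here is a scalar sketch (not the AVX-512 code) of a width-8 circulant multiply that uses only shifts and adds per term and a single 128-bit reduction per output. The row ordering in `ROW` is an assumption built from the entries {1,3,4,7,8,9} named above, and `reduce128` is a plain `%` stand-in for the real reduction.

```rust
const P: u128 = 0xFFFF_FFFF_0000_0001; // Goldilocks prime 2^64 - 2^32 + 1

// Assumed first row of the circulant matrix; only the set of entries
// comes from the commit message, the ordering is illustrative.
const ROW: [u64; 8] = [7, 1, 3, 8, 8, 3, 4, 9];

/// Stand-in for the prover's reduce128 (a real one exploits the shape of p).
fn reduce128(x: u128) -> u64 {
    (x % P) as u64
}

/// Multiplies by one of the small entries using shifts and adds only.
fn mul_small(x: u128, c: u64) -> u128 {
    match c {
        1 => x,
        3 => (x << 1) + x,
        4 => x << 2,
        7 => (x << 3) - x,
        8 => x << 3,
        9 => (x << 3) + x,
        _ => unreachable!("entry outside the assumed set"),
    }
}

/// Dense circulant multiply: 8 unreduced terms per output (< 2^71 in total),
/// then one reduction, so 8 reductions per call.
fn circulant_mds(state: [u64; 8]) -> [u64; 8] {
    let mut out = [0u64; 8];
    for i in 0..8 {
        let mut acc: u128 = 0;
        for j in 0..8 {
            // Entry (i, j) of a circulant matrix is ROW[(j - i) mod 8].
            acc += mul_small(state[j] as u128, ROW[(j + 8 - i) % 8]);
        }
        out[i] = reduce128(acc);
    }
    out
}

/// Reference version that reduces after every product.
fn circulant_mds_naive(state: [u64; 8]) -> [u64; 8] {
    let mut out = [0u64; 8];
    for i in 0..8 {
        let mut acc: u128 = 0;
        for j in 0..8 {
            let term = (state[j] as u128 * ROW[(j + 8 - i) % 8] as u128) % P;
            acc = (acc + term) % P;
        }
        out[i] = acc as u64;
    }
    out
}
```

The sparse partial-round form replaces these small entries with arbitrary 64-bit constants, so each product needs its own reduction, which matches the 15 vs 8 count reported above.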

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
cubic_mul_generic is the hottest field op in the prover after the poseidon
permutation (~15% of an xmss prove, via sumcheck eq-eval and the poseidon AIR
constraint eval). On Goldilocks each multiply carries a 128->64-bit reduction
(the dominant cost), so 3-term Karatsuba trading 3 of the 9 multiplies for
cheap field adds/subs is a net win across all packed backends.

Measured: xmss --n-signatures 1550 --log-inv-rate 1 goes 392-394 -> 400-402
XMSS/s (~2%). Verified against the schoolbook reference (10k scalar + 2k packed
random inputs) and end-to-end recursion still proves+verifies.
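The multiply-count trade can be sketched on the unreduced product of two degree-2 polynomials over Goldilocks. This is a scalar illustration with hypothetical function names, not `cubic_mul_generic` itself; the final reduction modulo the extension's defining polynomial is left out because the PR description does not state it.

```rust
const P: u128 = 0xFFFF_FFFF_0000_0001; // Goldilocks prime 2^64 - 2^32 + 1

// Scalar field helpers; inputs are assumed canonical (< p).
fn add(a: u64, b: u64) -> u64 { ((a as u128 + b as u128) % P) as u64 }
fn sub(a: u64, b: u64) -> u64 { ((a as u128 + P - b as u128) % P) as u64 }
fn mul(a: u64, b: u64) -> u64 { ((a as u128 * b as u128) % P) as u64 }

/// Schoolbook product: 9 field multiplies.
fn poly_mul_schoolbook(a: [u64; 3], b: [u64; 3]) -> [u64; 5] {
    let mut c = [0u64; 5];
    for i in 0..3 {
        for j in 0..3 {
            c[i + j] = add(c[i + j], mul(a[i], b[j]));
        }
    }
    c
}

/// 3-term Karatsuba: 6 field multiplies plus extra adds/subs.
fn poly_mul_karatsuba(a: [u64; 3], b: [u64; 3]) -> [u64; 5] {
    let d0 = mul(a[0], b[0]);
    let d1 = mul(a[1], b[1]);
    let d2 = mul(a[2], b[2]);
    let d01 = mul(add(a[0], a[1]), add(b[0], b[1]));
    let d02 = mul(add(a[0], a[2]), add(b[0], b[2]));
    let d12 = mul(add(a[1], a[2]), add(b[1], b[2]));
    [
        d0,
        sub(sub(d01, d0), d1),          // a0*b1 + a1*b0
        add(sub(sub(d02, d0), d2), d1), // a0*b2 + a1*b1 + a2*b0
        sub(sub(d12, d1), d2),          // a1*b2 + a2*b1
        d2,
    ]
}
```

Comparing the two functions on random inputs mirrors the schoolbook cross-check described above.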

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bring in main's 13 post-merge-base commits (python-verifier cleanup,
zkDSL compiler fixes, doc fixes, generic perf opts) while preserving the
goldilocks field migration (Goldilocks + cubic extension, Poseidon8,
128-bit security, folding-pow-grinding removed).

Conflict resolutions:
- backend/air (lib.rs, constraint_folder/packed.rs): keep goldilocks's
  removal of the degree-split low_degree_block/skip_low machinery; take
  main's #[inline(always)] on the methods goldilocks keeps.
- backend/sumcheck/product_computation.rs: keep goldilocks's retained
  compute_product_sumcheck_polynomial_base_ext_packed (main deleted it as
  dead; goldilocks keeps it for a planned Goldilocks optimization).
- whir/merkle.rs: apply main's "remove Matrix trait" refactor (DenseMatrix
  -> Matrix struct, no M generic) onto goldilocks's Goldilocks/Poseidon8
  field logic.
- lean_vm isa/instruction.rs + tables/poseidon/mod.rs: keep goldilocks's
  POSEIDON8 naming; adopt main's intent of a generic table name ("poseidon8").
- Adopt main's no-loop-carried-mutables rule: rewrite goldilocks's
  soundness_4/5 and the ALL_PRECOMPILES_PROGRAM counter loop to buffers.
- rec_aggregation/zkdsl_implem/whir.py: take main's buffer-carry structure
  with goldilocks's copy_ef/DIM and 5-value get_whir_params (no folding
  grinding); drop main's sumcheck_verify_with_grinding.
- Drop the KoalaBear-specific python verifier (primitives.py, verifier.py,
  test_verify.py, check_whir_configs.rs) and its test-vector dumping in
  test_zkvm.rs; restore goldilocks's coherent Goldilocks test program.

cargo fmt / clippy / testall all pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>